Multi-player bandits
Distributed Multi-Player Bandits - a Game of Thrones Approach
We consider a multi-armed bandit game where N players compete for K arms for T turns. Each player has different expected rewards for the arms, and the instantaneous rewards are independent and identically distributed. Performance is measured using the expected sum of regrets, compared to the optimal assignment of arms to players. We assume that each player only knows her actions and the reward she received each turn. Players cannot observe the actions of other players, and no communication between players is possible. We present a distributed algorithm and prove that it achieves an expected sum of regrets of near-O(log^2 T). This is the first algorithm to achieve a poly-logarithmic regret in this fully distributed scenario. All other works have assumed that either all players have the same vector of expected rewards or that communication between players is possible.
Reviews: Distributed Multi-Player Bandits - a Game of Thrones Approach
This paper studies a distributed multi-armed bandit setting in which, every round, each of N players must choose one of K arms. If a player picks the same arm as some other player, she receives payoff zero; otherwise, she receives a payoff drawn from some distribution specific to that player and that arm. This can be thought of as learning a distributed allocation, or matching, from players to arms. The goal is to design a communication-free learning algorithm that maximizes the total overall utility of all the players (or equivalently, minimizes their regret with respect to the best fixed allocation). The authors design an algorithm that achieves total regret of O(log^2 T).
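The collision model described in this setting can be made concrete with a minimal Python sketch. This is an illustration of the problem setup only, not the paper's algorithm; the function name and the Bernoulli reward assumption are choices made here for clarity.

```python
import random

def play_round(choices, means, rng=random):
    """Simulate one round of the multi-player bandit with collisions.

    choices[i] is the arm picked by player i; means[i][k] is the expected
    reward of arm k for player i. Players that collide (pick the same arm
    as another player) receive zero reward; otherwise the reward is a
    Bernoulli draw with that player's mean for that arm (an illustrative
    choice -- any i.i.d. reward distribution fits the setting).
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1:
            rewards.append(0.0)  # collision: payoff zero
        else:
            rewards.append(1.0 if rng.random() < means[player][arm] else 0.0)
    return rewards

# Two players, two arms: player 0 prefers arm 0, player 1 prefers arm 1.
means = [[0.9, 0.1], [0.2, 0.8]]
print(play_round([0, 1], means))  # no collision: independent Bernoulli draws
print(play_round([0, 0], means))  # collision: both players get 0.0
```

Note that each player observes only her own reward, so a colliding player cannot directly tell a collision apart from an unlucky zero-reward draw; this observability constraint is what makes the fully distributed setting hard.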
A survey on multi-player bandits
Boursier, Etienne, Perchet, Vianney
Owing mostly to its application to cognitive radio networks, the multiplayer bandit problem has gained a lot of interest in the last decade, and considerable progress has been made on its theoretical side. However, current algorithms remain far from practical, and many obstacles stand between these theoretical results and an actual implementation of multiplayer bandit algorithms in real cognitive radio networks. This survey contextualizes and organizes the rich multiplayer bandits literature. In light of the existing works, some clear directions for future research emerge. We believe that further study of these directions might lead to theoretical algorithms adapted to real-world situations.
Multitask Bandit Learning through Heterogeneous Feedback Aggregation
Wang, Zhi, Zhang, Chicheng, Singh, Manish Kumar, Riek, Laurel D., Chaudhuri, Kamalika
Online multi-armed bandit learning has many important real-world applications (see Villar et al., 2015; Shen et al., 2015; Li et al., 2010, for a few examples). In practice, a group of online bandit learning agents are often deployed for similar tasks, and they learn to perform these tasks in similar yet nonidentical environments. For example, a group of assistive healthcare robots may be deployed to provide personalized cognitive training to people with dementia (PwD), e.g., by playing cognitive training games with people (Kubota et al., 2020). Each robot seeks to learn the preferences of its paired PwD so as to recommend tailored health interventions based on how the PwD reacts to and is engaged with the activities (as captured by sensors on the robots) (Kubota et al., 2020). As PwD may have similar preferences and may therefore exhibit similar reactions, a natural question arises: can the robots, as a multi-agent system, learn to perform their respective tasks faster through collaboration? In this paper, we develop multi-agent bandit learning algorithms where each agent can robustly aggregate data from other agents to better perform its respective task. We generalize the multi-armed bandit problem (Auer et al., 2002) and formulate the ɛ-Multi-Player Multi-Armed Bandit (ɛ-MPMAB) problem, which models heterogeneous multitask learning in a multi-agent bandit learning setting. In an ɛ-MPMAB problem instance, a set of M players are deployed to perform similar tasks: they simultaneously interact with a set of actions/arms, and for each arm, different players receive feedback from similar but not necessarily identical reward distributions. In the above assistive robotics example, each player corresponds to a robot; each arm corresponds to one of the cognitive activities to choose from; for each player and each arm, there is a separate reward distribution which reflects a PwD's
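The "similar but not necessarily identical" reward distributions in the ɛ-MPMAB formulation can be sketched as a bound on how far any two players' mean rewards may differ on the same arm. The function name and the exact form of the condition below are assumptions made for illustration, based on the abstract rather than the paper's formal definition.

```python
def dissimilarity(means):
    """Max gap between any two players' mean rewards on the same arm.

    means[i][k] is the mean reward of arm k for player i. Under the
    illustrative reading used here, an instance qualifies as an
    eps-MPMAB instance when this value is at most eps.
    """
    n_players = len(means)
    n_arms = len(means[0])
    gap = 0.0
    for k in range(n_arms):
        column = [means[i][k] for i in range(n_players)]
        gap = max(gap, max(column) - min(column))
    return gap

# Three players (robots) with nearly identical preferences over two arms.
means = [[0.50, 0.30], [0.55, 0.25], [0.52, 0.28]]
print(round(dissimilarity(means), 3))  # -> 0.05
```

The smaller this gap, the more a player gains from aggregating other players' feedback; when the gap is large, naive aggregation injects bias, which is why the paper emphasizes robust aggregation.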